Distributed Approximate KNN Graph Construction for High-Dimensional Data

Authors

  • Mohamed Riadh Trad
  • Alexis Joly
  • Nozha Boujemaa
Abstract

Constructing nearest neighbor graphs is a crucial problem for many applications, particularly those involving machine learning and data mining algorithms. Although some work addresses the problem in centralized environments, these approaches remain limited by the growing volume and dimensionality of the data. In this paper, we propose a hashing-based method for constructing nearest neighbor graphs. The proposed method is distributable and scales both in data volume and in dimensionality. Moreover, the use of a new family of hash functions, RMMH, guarantees load balancing in parallel and distributed environments.
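The abstract does not spell out the construction, so what follows is only a minimal sketch of the general idea of hashing-based KNN graph construction, under stated assumptions. It swaps in plain random-hyperplane LSH (the sign of random projections) for RMMH, whose definition is not given here; the names `hyperplane_hash` and `approx_knn_graph` and the parameters `n_bits`, `n_tables` and `seed` are hypothetical. Points that share a bucket in any hash table become neighbor candidates, and each point keeps its k closest candidates.

```python
import math
import random
from collections import defaultdict


def hyperplane_hash(point, planes):
    """Hypothetical helper: random-hyperplane LSH, one bit per plane.

    A bit is 1 when the point lies on the non-negative side of the plane.
    """
    bits = 0
    for plane in planes:
        dot = sum(p * w for p, w in zip(point, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def approx_knn_graph(points, k, n_bits=4, n_tables=3, seed=0):
    """Hypothetical sketch of hashing-based approximate KNN graph construction.

    Assumption: random-hyperplane LSH stands in for RMMH, which is not
    specified here. Returns {point index: list of up to k neighbor indices}.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # candidates[i] collects every index that shared a bucket with i in some table.
    candidates = defaultdict(set)
    for _ in range(n_tables):
        # Fresh Gaussian hyperplanes through the origin for each table.
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        buckets = defaultdict(list)
        for idx, pt in enumerate(points):
            buckets[hyperplane_hash(pt, planes)].append(idx)
        # Each bucket is an independent unit of work; this is the part
        # that could be spread across machines.
        for members in buckets.values():
            for i in members:
                candidates[i].update(j for j in members if j != i)
    graph = {}
    for i, pt in enumerate(points):
        # Exact distances, but only against the hashed candidates
        # (ties broken by index so the result is deterministic).
        ranked = sorted(candidates[i], key=lambda j: (math.dist(pt, points[j]), j))
        graph[i] = ranked[:k]
    return graph
```

For example, `approx_knn_graph([(10.0, 10.0), (11.0, 11.0), (12.0, 12.0)], k=2)` maps point 0 to `[1, 2]`. Because each bucket can be processed independently, this style of construction lends itself to distribution; random hyperplanes, however, give no control over bucket sizes, which is the load-balancing issue the abstract says RMMH is meant to address.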

Similar resources

Distributed computation of the knn graph for large high-dimensional point sets

High-dimensional problems arising from robot motion planning, biology, data mining, and geographic information systems often require the computation of k nearest neighbor (knn) graphs. The knn graph of a data set is obtained by connecting each point to its k closest points. As the research in the above-mentioned fields progressively addresses problems of unprecedented complexity, the demand for...

Fast Approximate kNN Graph Construction for High Dimensional Data via Recursive Lanczos Bisection

Nearest neighbor graphs are widely used in data mining and machine learning. A brute-force method to compute the exact kNN graph takes Θ(dn^2) time for n data points in the d-dimensional Euclidean space. We propose two divide and conquer methods for computing an approximate kNN graph in Θ(dn^t) time for high dimensional data (large d). The exponent t ∈ (1,2) is an increasing function of an intern...

EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph

Approximate nearest neighbor (ANN) search is a fundamental problem in many areas of data mining, machine learning and computer vision. The performance of traditional hierarchical structure (tree) based methods decreases as the dimensionality of data grows, while hashing based methods usually lack efficiency in practice. Recently, the graph based methods have drawn considerable attention. The ma...

Fast kNN Graph Construction with Locality Sensitive Hashing

The k nearest neighbors (kNN) graph, perhaps the most popular graph in machine learning, plays an essential role for graph-based learning methods. Despite its many elegant properties, the brute force kNN graph construction method has computational complexity of O(n^2), which is prohibitive for large scale data sets. In this paper, based on the divide-and-conquer strategy, we propose an efficient a...

Exploring Bit-Difference for Approximate KNN Search in High-dimensional Databases

In this paper, we develop a novel index structure to support efficient approximate k-nearest neighbor (KNN) query in high-dimensional databases. In high-dimensional spaces, the computational cost of the distance (e.g., Euclidean distance) between two points contributes a dominant portion of the overall query response time for memory processing. To reduce the distance computation, we first propo...

Publication date: 2012